CasOligo


CasOligo is an R package that searches the 18S rRNA gene for 20nt gRNA-target-site oligonucleotide sequences. These sites are used to design the taxon-specific gRNA needed for CRISPR-Cas Selective Amplicon Sequencing (CCSAS, Zhong et al., 2020), a method for assessing the eukaryotic microbiome of hosts (e.g. metazoans, plants). A taxon-specific gRNA guides the Cas nuclease to cut the 18S rRNA gene of the chosen host specifically, but not that of protists and fungi. This results in a sequencing library highly enriched in 18S amplicons from protists and fungi, allowing high-resolution surveys of the taxonomic composition and structure of the eukaryotic microbes associated with the host. CCSAS thus provides a new way to obtain high-resolution taxonomic data for the eukaryotic microbiomes of plants and animals.

To facilitate the application of CCSAS, we designed gRNA-target-sites and gRNAs for almost all metazoan and metaphyta taxa currently available in SILVA (Quast et al., 2013), creating a gRNA-target-site database for researchers who want to apply CCSAS to their own organisms. Beyond that, the CasOligo package provides an oligonucleotide design function, cas9.gRNA.oligo2, that can be used to design custom gRNA for any gene with a known sequence and a reference database, including other regions of the 18S rRNA gene, other rRNA genes and spacers (e.g. 16S, 23S or ITS), and metabolic genes (e.g. COX1). Thus, CCSAS makes it possible to study the genetic diversity of any gene in complex systems, including rare sequences, by removing any sequence that would otherwise dominate the data. The sequence-specific removal of any amplicon has a wide range of applications, including pathogen diagnosis and studies of symbiosis and microbiome therapy.

   

Fig. 1. Distribution of the number of sgRNA-target-sites across metazoan and plant taxa for designing taxon-specific, CRISPR-Cas9-compatible gRNA.

   

Features


   

How does the oligonucleotide-design algorithm work?


   

Installation


To install the latest version from GitHub, simply run the following from an R console:

if (!require("devtools"))
  install.packages("devtools")
devtools::install_github("kevinzhongxu/CasOligo")

     

Dependencies


This package requires the following R packages to be installed beforehand:

   

Citation


 

If you use CasOligo in a publication, please cite our article:

Zhong KX, Cho A, Deeg CM, Chan AM & Suttle CA. (2020) The use of CRISPR-Cas Selective Amplicon Sequencing (CCSAS) to reveal the eukaryotic microbiome of metazoans. xxxx xx(xx): xxxx. https://www.biorxiv.org/content/10.1101/2020.06.02.130807v1

   

Getting started


 

Example 1: Design the 20nt gRNA-target-site oligonucleotide

This example designs the 20nt gRNA-target-site oligonucleotide for a gRNA of the CRISPR-Cas9 system that cuts the 18S rRNA gene of the host, but not that of protists and fungi.

#If you aim to cut the 18S rRNA gene of the host in the V4 region flanked by the primer set TAReuk454FWD1 and TAReukREV3 (Stoeck et al., 2010), use the cas9.gRNA.oligo1 function, as it is based on the reference database for that region.
cas9.gRNA.oligo1(inseq="Path/To/Your/Input_sequence_fasta_file.fasta", target="Taxonomic_group_of_a_host")


#If you do NOT want to predict the gRNA's target range within a host taxonomic group, omit the target argument.
cas9.gRNA.oligo1(inseq="Path/To/Your/Input_sequence_fasta_file.fasta")


#If your input fasta file contains more than one sequence and you want to check the host target range among these sequences, and among these and all related sequences from SILVA, use cas9.gRNA.oligo1m.
cas9.gRNA.oligo1m(inseq="Path/To/Your/Input_sequence_fasta_file.fasta", target="Taxonomic_group_of_a_host")
cas9.gRNA.oligo1m(inseq="Path/To/Your/Input_sequence_fasta_file.fasta")


#If you aim to target another region of the 18S rRNA gene amplified by different primers, or any other gene, use the cas9.gRNA.oligo2() function; you need to generate your own reference database.
cas9.gRNA.oligo2(inseq="/home/kevin/Desktop/data/human.fasta", refseq="Path/To/Your/Reference_database_file.fasta", target="Homo_sapiens")
cas9.gRNA.oligo2(inseq="/home/kevin/Desktop/data/human.fasta", refseq="Path/To/Your/Reference_database_file.fasta")
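
The idea behind a taxon-specific gRNA-target-site can be illustrated with a small base-R sketch. It is not the CasOligo implementation, and the helper names find_candidate_sites and keep_host_specific are hypothetical. It assumes SpCas9-style targeting (a 20nt site immediately followed by an NGG PAM), scans the forward strand only, and treats a site as host-specific when it has no exact match in any non-target sequence. The sequences are toy data.

```r
# Illustrative sketch only: NOT the CasOligo algorithm.
# Assumption: SpCas9-style targeting, i.e. a 20nt site directly followed by an NGG PAM.
# Only the forward strand is scanned; a fuller search would also scan the reverse complement.

# Hypothetical helper: list every 20nt site in `seq` that is followed by NGG.
find_candidate_sites <- function(seq, site_len = 20L) {
  seq <- toupper(seq)
  n <- nchar(seq)
  # The site plus its 3nt PAM must fit inside the sequence
  last_start <- n - site_len - 3L + 1L
  if (last_start < 1L) return(character(0))
  starts <- seq_len(last_start)
  sites <- substring(seq, starts, starts + site_len - 1L)
  pams  <- substring(seq, starts + site_len, starts + site_len + 2L)
  # Keep sites whose PAM is any base followed by GG
  unique(sites[grepl("^[ACGT]GG$", pams)])
}

# Hypothetical helper: drop sites that occur exactly in any non-target sequence.
keep_host_specific <- function(sites, non_target_seqs) {
  non_target_seqs <- toupper(non_target_seqs)
  hits <- vapply(
    sites,
    function(s) any(grepl(s, non_target_seqs, fixed = TRUE)),
    logical(1),
    USE.NAMES = FALSE
  )
  sites[!hits]
}

# Toy sequences (made up for illustration, not real 18S data)
host_seq    <- "ACGTACGTACGTACGTACGTAGGTTTT"
protist_seq <- "TTTTGGGGCCCCAAAATTTTGGGGCCCC"

candidates <- find_candidate_sites(host_seq)
print(keep_host_specific(candidates, protist_seq))
```

Exact matching is the simplest possible specificity test. A practical design would also consider mismatch tolerance and how many sequences of the target taxon carry the site, which is what the target argument of the functions above is meant to report.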

 

Example 2: Design the 20nt gRNA-target-site oligonucleotide for the 18S sequence of the Pacific oyster

This example designs the 20nt gRNA-target-site oligonucleotide for a gRNA of the CRISPR-Cas9 system that cuts the 18S rRNA gene of the Pacific oyster Crassostrea gigas, but not that of protists and fungi.


#First, get the path to the 18S sequence of the Pacific oyster in fasta format (V4 region of the 18S rRNA gene flanked by the primers TAReuk454FWD1 and TAReukREV3)
input_fasta_file <- system.file("extdata", "pacific_oyster_18S_V4.fasta", package = "CasOligo")

#To design gRNA for the oyster 18S sequence and predict the sgRNA's target-range among other "Crassostrea_gigas" sequences in SILVA.
cas9.gRNA.oligo1(inseq=input_fasta_file, target="Crassostrea_gigas")

#To design gRNA for the oyster 18S sequence and predict the sgRNA's target-range among other "Ostreidae" sequences in SILVA.
cas9.gRNA.oligo1(inseq=input_fasta_file, target="Ostreidae")

#To design gRNA for the oyster 18S sequence and predict the sgRNA's target-range among other "Mollusca" sequences in SILVA.
cas9.gRNA.oligo1(inseq=input_fasta_file, target="Mollusca")

#To design gRNA for the oyster 18S sequence without predicting the sgRNA's target-range among other taxonomic groups.
cas9.gRNA.oligo1(inseq=input_fasta_file)
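
Because system.file() returns an empty string when the file or package cannot be found, it can help to check the path before passing it on as inseq. The following is a minimal sketch; is_readable_fasta is a hypothetical helper, not part of CasOligo, and it only checks that the file exists and that its first line looks like a FASTA header.

```r
# Hypothetical helper (not part of CasOligo): TRUE if `path` points to an
# existing file whose first line starts with ">", as a FASTA header does.
is_readable_fasta <- function(path) {
  if (!is.character(path) || length(path) != 1L || !nzchar(path)) return(FALSE)
  if (!file.exists(path) || dir.exists(path)) return(FALSE)
  first_line <- readLines(path, n = 1L, warn = FALSE)
  length(first_line) == 1L && startsWith(first_line, ">")
}

# system.file() yields "" if the package or file is missing, so this check
# fails cleanly instead of passing an empty path on as inseq.
input_fasta_file <- system.file("extdata", "pacific_oyster_18S_V4.fasta", package = "CasOligo")
if (!is_readable_fasta(input_fasta_file)) {
  message("Example fasta file not found or not in fasta format; is CasOligo installed?")
}
```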

 

Example 3: Retrieve the 20nt gRNA-target-site oligonucleotide sequence from database

We have already built a database of gRNA-target-sites (Zhong et al., 2020) for almost all metazoan and plant species available in SILVA (Quast et al., 2013).

If you know which host taxon you want to cut and its name, you can use the search.db.byname function to retrieve the oligo.

#To successfully search the database, the taxon name must be the same as in the SILVA database
search.db.byname(query="Host_species or Host_taxonomic group", cas="Name_of_Cas")

search.db.byname(query="Homo sapiens", cas="Cas9")
search.db.byname(query="Salmon", cas="Cas9")
search.db.byname(query="Mollusca", cas="Cas9")
search.db.byname(query="Crassostrea gigas", cas="Cas9")

search.db.byname(query="Homo sapiens", cas="Cas12a")
search.db.byname(query="Salmon", cas="Cas12a")
search.db.byname(query="Mollusca", cas="Cas12a")
search.db.byname(query="Crassostrea gigas", cas="Cas12a")
  

If you want to know the cut details of a gRNA-target-site, you can use the search.db.byid function as follows.

search.db.byid(query="ID_of_the_gRNA-target-site", cas="Name_of_Cas")

search.db.byid(query="probe_022593", cas="Cas9")
  

 

References


Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).

Stoeck, T. et al. Multiple marker parallel tag environmental DNA sequencing reveals a highly complex eukaryotic community in marine anoxic water. Mol. Ecol. 19, 21–31 (2010).

Zhong KX, Cho A, Deeg CM, Chan AM & Suttle CA. The use of CRISPR-Cas Selective Amplicon Sequencing (CCSAS) to reveal the eukaryotic microbiome of metazoans. xxx xx(xx): xxxx (2020). https://www.biorxiv.org/content/10.1101/2020.06.02.130807v1

 

License


 

This work is subject to the MIT License.

   

 


A work by Kevin Xu ZHONG